 probabilistic dynamic model


Probabilistic Artificial Intelligence

Krause, Andreas, Hübotter, Jonas

arXiv.org Artificial Intelligence

Artificial intelligence commonly refers to the science and engineering of artificial systems that can carry out tasks generally thought to require aspects of human intelligence, such as playing games, translating languages, and driving cars. In recent years, there have been exciting advances in learning-based, data-driven approaches towards AI, and machine learning and deep learning have enabled computer systems to perceive the world in unprecedented ways. Reinforcement learning has enabled breakthroughs in complex games such as Go and challenging robotics tasks such as quadrupedal locomotion. A key aspect of intelligence is to not only make predictions, but also reason about the uncertainty in these predictions, and to consider this uncertainty when making decisions. This is what this manuscript on "Probabilistic Artificial Intelligence" is about. The first part covers probabilistic approaches to machine learning. We discuss the distinction between "epistemic" uncertainty due to lack of data and "aleatoric" uncertainty, which is irreducible and stems, e.g., from noisy observations and outcomes. We discuss concrete approaches towards probabilistic inference and modern approaches to efficient approximate inference. The second part of the manuscript is about taking uncertainty into account in sequential decision tasks. We consider active learning and Bayesian optimization -- approaches that collect data by proposing experiments that are informative for reducing the epistemic uncertainty. We then consider reinforcement learning and modern deep RL approaches that use neural network function approximation. We close by discussing modern approaches in model-based RL, which harness epistemic and aleatoric uncertainty to guide exploration, while also reasoning about safety.
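One common way to make the epistemic/aleatoric split concrete is the law of total variance applied to an ensemble of Gaussian predictors. The sketch below is illustrative only and is not taken from the manuscript: it assumes each ensemble member outputs a mean and a variance, treats the average predicted variance as the aleatoric part, and treats the spread of the member means as the epistemic part. The function name `decompose_uncertainty` is hypothetical.

```python
import numpy as np

def decompose_uncertainty(means, variances):
    """Split the predictive variance of a uniform mixture of Gaussians.

    means, variances: arrays of shape (n_members, ...) holding each
    ensemble member's predicted mean and predicted variance.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    aleatoric = variances.mean(axis=0)  # average noise the members predict
    epistemic = means.var(axis=0)       # disagreement between member means
    return aleatoric, epistemic, aleatoric + epistemic

# Three hypothetical ensemble members predicting the same scalar target.
aleatoric, epistemic, total = decompose_uncertainty(
    means=[1.0, 1.2, 0.8], variances=[0.05, 0.04, 0.06]
)
```

Under this assumption, collecting more data should shrink the epistemic term as the members come to agree, while the aleatoric term stays roughly constant.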


Reviews: Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

Neural Information Processing Systems

This paper describes a model-based reinforcement learning approach applied to four continuous-control MuJoCo tasks. The approach incorporates uncertainty into the forward dynamics model in two ways: by predicting a Gaussian distribution over future states rather than a single point, and by training an ensemble of models on different subsets of the agent's experience. As a controller, the authors use the cross-entropy method (CEM) to generate action sequences, which are then rolled out through the stochastic forward dynamics model to produce state trajectories. Reward sums are computed for each action-conditioned trajectory, and the action corresponding to the highest predicted reward is executed. This is thus a form of model-predictive control. In their experiments, the authors show that their method matches the performance of state-of-the-art model-free approaches with far fewer environment interactions, i.e. with improved sample complexity, on 3 of the 4 tasks.
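The control loop the review describes can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy dynamics, the reward, the action bounds of [-1, 1], and all hyperparameters are assumptions, and `cem_plan`, `toy_dynamics`, and `toy_reward` are hypothetical names. In a full model-predictive controller this function would be called again at every environment step.

```python
import numpy as np

def cem_plan(state, dynamics, reward, horizon=10, pop=200, n_elite=20,
             iters=5, act_dim=1, rng=None):
    """Return the first action of the best action sequence found by CEM."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.zeros((horizon, act_dim))
    sigma = np.ones((horizon, act_dim))
    for _ in range(iters):
        # Sample candidate action sequences and clip to the assumed bounds.
        noise = rng.standard_normal((pop, horizon, act_dim))
        actions = np.clip(mu + sigma * noise, -1.0, 1.0)
        # Roll every candidate through the stochastic model, summing rewards.
        returns = np.zeros(pop)
        s = np.repeat(state[None, :], pop, axis=0)
        for t in range(horizon):
            s = dynamics(s, actions[:, t], rng)
            returns += reward(s)
        # Refit the sampling distribution to the highest-return candidates.
        elite = actions[np.argsort(returns)[-n_elite:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return actions[np.argmax(returns)][0]

def toy_dynamics(s, a, rng):
    # Assumed stand-in for a learned model: noisy single integrator.
    return s + 0.1 * a + 0.01 * rng.standard_normal(s.shape)

def toy_reward(s):
    # Assumed task: drive the state to the origin.
    return -(s ** 2).sum(axis=1)

first_action = cem_plan(np.array([1.0]), toy_dynamics, toy_reward,
                        rng=np.random.default_rng(0))
```

Starting from a positive state, the planner should select a negative first action, since that moves the state towards the origin.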


Predicting Sim-to-Real Transfer with Probabilistic Dynamics Models

Zhang, Lei M., Plappert, Matthias, Zaremba, Wojciech

arXiv.org Artificial Intelligence

We propose a method to predict the sim-to-real transfer performance of RL policies. Our transfer metric simplifies the selection of training setups (such as algorithm, hyperparameters, randomizations) and policies in simulation, without the need for extensive and time-consuming real-world rollouts. A probabilistic dynamics model is trained alongside the policy and evaluated on a fixed set of real-world trajectories to obtain the transfer metric. Experiments show that the transfer metric is highly correlated with policy performance in both simulated and real-world robotic environments for complex manipulation tasks. We further show that the transfer metric can predict the effect of training setups on policy transfer performance.
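The abstract does not say how the dynamics model is scored on the real-world trajectories, so the sketch below is only one plausible reading: it assumes a model that returns a Gaussian mean and variance for the next state, and uses the average log-likelihood of the recorded transitions as the score. The names `transfer_score`, `make_trajectory`, and `sim_model` are hypothetical, and the toy "real" data is synthetic.

```python
import numpy as np

def gaussian_log_likelihood(target, mean, var):
    """Log-density of target under a diagonal Gaussian, summed over dims."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (target - mean) ** 2 / var,
                         axis=-1)

def transfer_score(model, trajectories):
    """Average per-step log-likelihood of recorded transitions under model."""
    scores = []
    for states, actions in trajectories:
        mean, var = model(states[:-1], actions)
        scores.append(gaussian_log_likelihood(states[1:], mean, var).mean())
    return float(np.mean(scores))

def make_trajectory(gain, steps=5):
    # Synthetic stand-in for a recorded rollout: s' = s + gain * a.
    actions = np.ones((steps, 1))
    states = np.zeros((steps + 1, 1))
    for t in range(steps):
        states[t + 1] = states[t] + gain * actions[t]
    return states, actions

def sim_model(states, actions):
    # Assumed simulator-trained model: believes the gain is exactly 1.
    return states + actions, np.full_like(states, 0.01)

score_matched = transfer_score(sim_model, [make_trajectory(gain=1.0)])
score_mismatched = transfer_score(sim_model, [make_trajectory(gain=0.5)])
```

In this toy setup the model scores higher on data whose dynamics match what it learned, which is the kind of signal a transfer metric would need to provide.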


Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

Chua, Kurtland, Calandra, Roberto, McAllister, Rowan, Levine, Sergey

Neural Information Processing Systems

Model-based reinforcement learning (RL) algorithms can attain excellent sample efficiency, but often lag behind the best model-free algorithms in terms of asymptotic performance. This is especially true with high-capacity parametric function approximators, such as deep networks. In this paper, we study how to bridge this gap, by employing uncertainty-aware dynamics models. We propose a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation. Our comparison to state-of-the-art model-based and model-free deep RL algorithms shows that our approach matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples (e.g. 8 and 125 times fewer samples than Soft Actor Critic and Proximal Policy Optimization respectively on the half-cheetah task).
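The "sampling-based uncertainty propagation" named in the abstract can be illustrated with particles pushed through an ensemble of Gaussian dynamics models. The sketch below is a simplified illustration, not the paper's code: it assumes each particle is tied to one ensemble member for the whole rollout, and the linear toy models, the noise level, and the names `propagate_particles` and `make_model` are hypothetical.

```python
import numpy as np

def propagate_particles(state, actions, ensemble, n_particles=20, rng=None):
    """Roll particles through an ensemble of probabilistic models.

    ensemble: list of callables (states, action) -> (mean, var) describing
    a Gaussian over the next state for each input state.
    """
    rng = np.random.default_rng() if rng is None else rng
    particles = np.repeat(state[None, :], n_particles, axis=0)
    # Assumption: each particle keeps the same ensemble member throughout.
    member = rng.integers(len(ensemble), size=n_particles)
    trajectory = np.empty((len(actions), n_particles, state.shape[0]))
    for t, action in enumerate(actions):
        for m, model in enumerate(ensemble):
            idx = member == m
            if not idx.any():
                continue
            mean, var = model(particles[idx], action)
            # Sampling from the predicted Gaussian carries aleatoric noise;
            # differences between members carry epistemic uncertainty.
            noise = rng.standard_normal(mean.shape)
            particles[idx] = mean + np.sqrt(var) * noise
        trajectory[t] = particles
    return trajectory

def make_model(gain):
    # Toy stand-in for one trained network: s' ~ N(s + gain * a, 1e-4).
    def model(states, action):
        return states + gain * action, np.full_like(states, 1e-4)
    return model

ensemble = [make_model(g) for g in (0.9, 1.0, 1.1)]
actions = np.full((5, 1), 0.1)
traj = propagate_particles(np.array([0.0]), actions, ensemble,
                           rng=np.random.default_rng(0))
```

The spread of `traj[-1]` across particles then reflects both sources of uncertainty, and a planner could average predicted rewards over the particles when scoring an action sequence.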